Miss Penalty Reduction Using Bundled Capacity Prefetching in Multiprocessors
Authors
Abstract
While prefetching has proven useful for reducing cache misses in multiprocessors, it often increases traffic because of extra, unused prefetched data. Prefetching in multiprocessors can also increase the cache miss rate, since the larger pieces of data retrieved cause false sharing. The capacity prefetching strategy proposed in this paper is built on the assumption that prefetching is most beneficial for reducing capacity and cold misses, but not communication misses. We propose a simple scheme for detecting the most frequent communication misses and suggest that prefetching should be avoided for those. We also suggest bundling, a simple and effective strategy for reducing address traffic when retrieving many sequential cache lines. To demonstrate the effectiveness of these approaches, we have evaluated both strategies for one of the simplest forms of prefetching, sequential prefetching. Applied to this bandwidth-hungry prefetch technique, the two new strategies result in a lower miss rate for all studied applications, while the average address traffic is reduced compared with running the same applications with no prefetching. The proposed strategies could also be applied to more sophisticated prefetching techniques for better overall performance.
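To make the prefetch decision concrete, the following is a minimal sketch in C of one plausible reading of the abstract: a miss to a line whose tag is still cached but marked invalid (i.e., it was invalidated by a remote write) is classified as a communication miss and fetched alone, while any other miss (cold or capacity) triggers a bundle of sequential lines issued as a single address transaction. All names (cache_t, classify via access_line, issue_request, BUNDLE_SIZE, the tiny direct-mapped cache) are illustrative assumptions, not the paper's implementation.

/*
 * Sketch: bundled capacity prefetching decision.
 * Assumption: invalidated lines keep their tag so a later miss to the same
 * line can be recognized as a communication miss; installing the extra
 * bundled lines is omitted for brevity.
 */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS     4     /* tiny direct-mapped cache, for illustration only */
#define BUNDLE_SIZE  4     /* sequential lines fetched on a capacity/cold miss */

typedef enum { LINE_EMPTY, LINE_VALID, LINE_INVALIDATED } state_t;

typedef struct {
    uint64_t tag;
    state_t  state;
} line_t;

typedef struct {
    line_t set[NUM_SETS];
} cache_t;

/* One address transaction: a single line, or a bundle of sequential lines. */
static void issue_request(uint64_t first_line, int nlines)
{
    printf("  request: line %llu, %d line(s) in one address transaction\n",
           (unsigned long long)first_line, nlines);
}

/* Returns true on a hit; on a miss, decides whether to bundle. */
static bool access_line(cache_t *c, uint64_t line_addr)
{
    line_t *l = &c->set[line_addr % NUM_SETS];

    if (l->state == LINE_VALID && l->tag == line_addr)
        return true;                              /* hit, nothing to do */

    bool communication_miss =
        (l->state == LINE_INVALIDATED && l->tag == line_addr);

    /* Communication miss: fetch only the requested line.
     * Cold/capacity miss: fetch a bundle of sequential lines. */
    issue_request(line_addr, communication_miss ? 1 : BUNDLE_SIZE);

    l->tag = line_addr;
    l->state = LINE_VALID;
    return false;
}

/* Remote write: invalidate the local copy but keep the tag so a later miss
 * to the same line is classified as a communication miss. */
static void remote_invalidate(cache_t *c, uint64_t line_addr)
{
    line_t *l = &c->set[line_addr % NUM_SETS];
    if (l->tag == line_addr && l->state == LINE_VALID)
        l->state = LINE_INVALIDATED;
}

int main(void)
{
    cache_t c = {0};

    puts("cold miss (bundled):");
    access_line(&c, 8);

    puts("remote write invalidates line 8, then re-read (not bundled):");
    remote_invalidate(&c, 8);
    access_line(&c, 8);

    return 0;
}

The point of the sketch is the asymmetry: sequential bundles are only requested on misses where spatial locality is likely to pay off, and the single bundled request replaces several separate address transactions.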
Similar Papers
A Preliminary Evaluation of Cache-miss-initiated Prefetching Techniques in Scalable Multiprocessors
Prefetching is an important technique for reducing the average latency of memory accesses in scalable cache-coherent multiprocessors. Aggressive prefetching can significantly reduce the number of cache misses, but may introduce bursty network and memory traffic, and increase data sharing and cache pollution. Given that we anticipate enormous increases in both network bandwidth and latency, we exam...
Sequential Hardware Prefetching in Shared-Memory Multiprocessors
To offset the effect of read miss penalties on processor utilization in shared-memory multiprocessors, several software- and hardware-based data prefetching schemes have been proposed. A major advantage of hardware techniques is that they need no support from the programmer or compiler. Sequential prefetching is a simple hardware-controlled prefetching technique which relies on the automatic pref...
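For reference, a minimal sketch of the sequential prefetching idea that this and the following entries build on: on a demand miss to block B, the next K consecutive blocks are also requested. The degree K, the fetch_block hook, and the function names are illustrative assumptions, not taken from the cited paper.

/* Sketch: degree-K sequential prefetching on a demand miss. */
#include <stdio.h>
#include <stdint.h>

#define PREFETCH_DEGREE 2   /* K consecutive blocks prefetched per miss */

/* Hypothetical memory-request hook; a real controller would enqueue these. */
static void fetch_block(uint64_t block, int is_prefetch)
{
    printf("%s block %llu\n", is_prefetch ? "prefetch" : "demand fetch",
           (unsigned long long)block);
}

/* On a demand miss to `block`, also request the next K consecutive blocks. */
static void on_cache_miss(uint64_t block)
{
    fetch_block(block, 0);
    for (int k = 1; k <= PREFETCH_DEGREE; k++)
        fetch_block(block + k, 1);
}

int main(void)
{
    on_cache_miss(100);   /* demand miss triggers prefetch of blocks 101, 102 */
    return 0;
}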
Cache Injection on Bus Based Multiprocessors
Software-controlled cache prefetching and data forwarding are widely used techniques for tolerating memory latency in shared-memory multiprocessors. However, some previous studies show that cache prefetching is not very effective on bus-based multiprocessors, and the effectiveness of data forwarding has not yet been explored in this environment. In this paper, a novel technique called cache in...
An adaptive sequential prefetching scheme in shared-memory multiprocessors
The sequential prefetching scheme is a simple hardware-controlled scheme, which exploits the sequentiality of memory accesses to predict which blocks will be read in the near future. We analyze the relationship between the sequentiality of application programs and the effectiveness of sequential prefetching on shared-memory multiprocessors. Also, we propose a simple hardware scheme which selects...
Branch History Guided Instruction Prefetching
Instruction cache misses stall the fetch stage of the processor pipeline and hence affect instruction supply to the processor. Instruction prefetching has been proposed as a mechanism to reduce instruction cache (I-cache) misses. However, a prefetch is effective only if it is accurate and initiated sufficiently early to cover the miss penalty. This paper presents a new hardware-based instruction pref...